Combine Vector Quantization and Support Vector Machine for Imbalanced Datasets

Authors

  • Ting Yu
  • John K. Debenham
  • Tony Jan
  • Simeon J. Simoff

Abstract

For extremely imbalanced, high-dimensional datasets, standard machine learning techniques tend to be overwhelmed by the large (majority) classes. This paper rebalances skewed datasets by compressing the majority class. It combines Vector Quantization (VQ) and the Support Vector Machine (SVM) into a new approach, VQ-SVM, which rebalances datasets without significant information loss. Issues such as distortion and support vectors are discussed in order to address the trade-off between information loss and undersampling. Experiments compare VQ-SVM with a standard SVM on several datasets with varying imbalance ratios, and the results show that VQ-SVM outperforms the standard SVM, especially on large, extremely imbalanced datasets.
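The compression step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which codebook algorithm or codebook size VQ-SVM uses, so this sketch assumes plain k-means (Lloyd's algorithm) as the vector quantizer and a codebook as large as the minority class. The function names `build_codebook`, `distortion`, and `vq_rebalance` are hypothetical. The returned distortion (mean squared quantization error) is one rough measure of how much majority-class information the compression discards.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: List[Vector], x: Sequence[float]) -> int:
    """Index of the codeword closest to x."""
    return min(range(len(codebook)), key=lambda j: sq_dist(codebook[j], x))


def build_codebook(data: List[Vector], k: int, iters: int = 50, seed: int = 0) -> List[Vector]:
    """Assumption: plain k-means (Lloyd) stands in for the paper's VQ codebook design."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(data, k)]  # initial codewords: k random points
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for x in data:
            clusters[nearest(codebook, x)].append(x)  # assign each vector to its nearest codeword
        new_codebook: List[Vector] = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                # Move the codeword to the centroid of its cluster.
                new_codebook.append([sum(m[d] for m in members) / len(members) for d in range(dim)])
            else:
                new_codebook.append(codebook[j])  # keep a codeword whose cluster is empty
        if new_codebook == codebook:  # assignments stable: converged
            break
        codebook = new_codebook
    return codebook


def distortion(data: List[Vector], codebook: List[Vector]) -> float:
    """Mean squared quantization error of data under the codebook."""
    return sum(sq_dist(x, codebook[nearest(codebook, x)]) for x in data) / len(data)


def vq_rebalance(majority: List[Vector], minority: List[Vector],
                 seed: int = 0) -> Tuple[List[Vector], List[int], float]:
    """Replace the majority class by a codebook the size of the minority class (assumption)."""
    k = min(len(minority), len(majority))
    codebook = build_codebook(majority, k, seed=seed)
    X = codebook + [list(v) for v in minority]
    y = [-1] * len(codebook) + [1] * len(minority)  # -1 = majority codewords, +1 = minority
    return X, y, distortion(majority, codebook)


if __name__ == "__main__":
    rng = random.Random(42)
    majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]
    minority = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]
    X, y, d = vq_rebalance(majority, minority)
    print(len(X), y.count(-1), y.count(1))  # 40 training vectors, 20 per class
    print("distortion:", round(d, 3))
```

The balanced set `X`, `y` could then be passed to any SVM trainer, for example scikit-learn's `sklearn.svm.SVC().fit(X, y)`. Shrinking the codebook lowers the imbalance ratio further but increases distortion, which is one way to see the trade-off between information loss and undersampling that the abstract mentions.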

Publication date: 2006